Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach


Abstract

Reinforcement learning is widely used in applications where one needs to perform sequential decisions while interacting with the environment. The problem becomes more challenging when the decision requirement includes satisfying some safety constraints, which is mathematically formulated as a constrained Markov decision process (CMDP). In the literature, various algorithms are available to solve CMDP problems in a model-free manner and achieve an epsilon-optimal cumulative reward with epsilon-feasible policies. An epsilon-feasible policy implies that it suffers from constraint violation. An important question here is whether we can achieve zero constraint violation or not. To achieve that, we advocate the use of a randomized primal-dual approach and propose a conservative stochastic primal-dual algorithm (CSPDA) which is shown to exhibit O(1/epsilon^2) sample complexity with zero constraint violations. In prior works, the best available sample complexity with zero constraint violation is O(1/epsilon^5). Hence, the proposed algorithm provides a significant improvement as compared to the state of the art.
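A minimal sketch of the primal-dual idea behind such conservative algorithms (this is not the paper's CSPDA): a one-state, two-action toy problem where a policy p maximizes expected reward r @ p subject to an expected-cost budget c @ p <= b. All numbers and the tightening margin kappa are illustrative assumptions; the point is that solving against the tightened budget b - kappa lets the averaged iterate satisfy the original budget, i.e. zero violation of the true constraint.

```python
import numpy as np

r = np.array([1.0, 0.2])     # per-action rewards (illustrative numbers)
c = np.array([1.0, 0.1])     # per-action costs
b = 0.5                      # true cost budget
kappa = 0.1                  # conservative tightening margin (assumption)

p = np.array([0.5, 0.5])     # primal variable: action distribution
lam = 0.0                    # dual variable for the cost constraint
eta = 0.05                   # step size
T = 5000
p_avg = np.zeros(2)          # averaged iterate, the quantity we report

for _ in range(T):
    # Primal ascent on the Lagrangian L(p, lam) = r @ p - lam * (c @ p - (b - kappa))
    p = p + eta * (r - lam * c)
    p = np.clip(p, 1e-8, None)
    p = p / p.sum()                               # crude projection onto the simplex
    # Dual ascent on the *tightened* constraint residual, kept nonnegative
    lam = max(0.0, lam + eta * (c @ p - (b - kappa)))
    p_avg += p / T

print("avg policy:", p_avg, "avg cost:", c @ p_avg, "budget:", b)
```

Because the dual update targets the tightened budget b - kappa, the averaged cost settles around b - kappa and stays below the true budget b, illustrating (on this toy problem) how conservatism trades a little reward for zero constraint violation.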


Similar papers

Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning

Constrained Markov Decision Process (CMDP) is a natural framework for reinforcement learning tasks with safety constraints, where agents learn a policy that maximizes the long-term reward while satisfying the constraints on the long-term cost. A canonical approach for solving CMDPs is the primal-dual method which updates parameters in primal and dual spaces in turn. Existing methods for CMDPs o...


Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning

We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which exploit the inherent minimax duality of Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as a new state transition is observed. These methods use small storage and have low computational complexity p...


Primal-Dual Strategy for Constrained Optimal

An algorithm for efficient solution of control constrained optimal control problems is proposed and analyzed. It is based on an active set strategy involving primal as well as dual variables. For discretized problems sufficient conditions for convergence in finitely many iterations are given. Numerical examples are given and the role of strict complementarity condition is discussed. 1. Introduction a...


Primal–dual Methods for Nonlinear Constrained Optimization

. . . If a function of several variables should be a maximum or minimum and there are between these variables one or several equations, then it will suffice to add to the proposed function the functions that should be zero, each multiplied by an undetermined quantity, and then to look for the maximum and the minimum as if the variables were independent; the equation that one will find combin...


Asymptotic convergence of constrained primal-dual dynamics

This paper studies the asymptotic convergence properties of the primal-dual dynamics designed for solving constrained concave optimization problems using classical notions from stability analysis. We motivate the need for this study by providing an example that rules out the possibility of employing the invariance principle for hybrid automata to study asymptotic convergence. We understand the ...
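The convergence behavior this abstract discusses can be illustrated with a forward-Euler discretization of the projected primal-dual dynamics on a small concave problem. The objective, constraint, and step size below are illustrative assumptions, not taken from the paper: maximize f(x) = -(x - 2)^2 subject to x <= 1, with Lagrangian L(x, lam) = -(x - 2)^2 - lam * (x - 1).

```python
# Forward-Euler discretization of the projected primal-dual dynamics
#   x' = dL/dx,   lam' = -dL/dlam (projected onto lam >= 0),
# for: maximize -(x - 2)**2 subject to x <= 1.
h = 0.01                  # Euler step size (assumption)
x, lam = 0.0, 0.0         # initial primal and dual states

for _ in range(5000):
    x_dot = -2.0 * (x - 2.0) - lam      # primal ascent direction
    lam_dot = x - 1.0                   # dual ascent direction: constraint residual
    x = x + h * x_dot
    lam = max(0.0, lam + h * lam_dot)   # keep the multiplier nonnegative

print(f"x -> {x:.4f}, lam -> {lam:.4f}")  # trajectory approaches the saddle (1, 2)
```

The trajectory converges to the saddle point x* = 1, lam* = 2, where the constraint is active and lam* equals the objective's gradient magnitude at x*; this is the asymptotic convergence the stability analysis in such papers formalizes.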



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i4.20281